Richard Erickson
1 May 2019
U.S. Department of the Interior
U.S. Geological Survey
lmer() and glmer() functionslme4 lme4
Explicitly modeling uncertainty
Image source: Library of Congress
lm()forumla =data =lm(formula = y~ x, data = dat)
lm(y ~ x1 + x2, data = dat)
x1 = c(1.2, 0.3, -1.0, 0.1)0s and 1s to code for membership in a groupx2 = c(0, 1, 1, 0)
formula = y ~ 1 + x2 formula = y ~ x2)formula = y ~ x2 - 1
df = data.frame(length = c(1, 2, 3),
fish = c("red", "blue", "red"))
print(df)
length fish
1 1 red
2 2 blue
3 3 red
model.matrix( ~ length + fish, data = df)
(Intercept) length fishred
1 1 1 1
2 1 2 0
3 1 3 1
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$fish
[1] "contr.treatment"
model.matrix( ~ length + fish - 1, data = df)
length fishblue fishred
1 1 0 1
2 2 1 0
3 3 0 1
attr(,"assign")
[1] 1 2 2
attr(,"contrasts")
attr(,"contrasts")$fish
[1] "contr.treatment"
glimpse(fishes)
Observations: 20
Variables: 2
$ length <dbl> 8.792934, 10.277429, 11.084441, 7.654302, 10.429125, 10.5…
$ fish <chr> "red", "red", "red", "red", "red", "red", "red", "red", "…
fish_summary <- fishes %>% group_by(fish) %>% summarize(mean(length))
fish_summary
# A tibble: 2 x 2
fish `mean(length)`
<chr> <dbl>
1 blue 4.88
2 red 9.62
fish_lm <- lm(length ~ fish, data = fishes)
print(coef(fish_lm), digits = 3)
(Intercept) fishred
4.88 4.74
print(fish_summary[2,2] - fish_summary[1,2], digits = 3)
mean(length)
1 4.74
fish_lm <- lm(length ~ fish - 1, data = fishes)
print(coef(fish_lm), digits = 3)
fishblue fishred
4.88 9.62
print(fish_summary)
# A tibble: 2 x 2
fish `mean(length)`
<chr> <dbl>
1 blue 4.88
2 red 9.62
Multiple fish per tank?
Multiple definitions (see Gelman's blog post)
lmer())
library(tidyverse)
lamprey <- read_csv("./data/lamprey_adult_lab_DNA.csv")
head(lamprey)
# A tibble: 6 x 6
Sample Fluor Copies Inhibited tank stock
<chr> <chr> <dbl> <chr> <chr> <chr>
1 200L-2A FAM 1336000 NO 2A 200L
2 200L-2B FAM 1099000 NO 2B 200L
3 200L-2C FAM 1229000 NO 2C 200L
4 200L-3A FAM 1827000 NO 3A 200L
5 200L-3B FAM 1758000 NO 3B 200L
6 200L-3C FAM 1543000 NO 3C 200L
lamprey <- lamprey %>%
mutate(Copies_2 = ifelse(is.na(Copies), log10(0 + 1),
log10(Copies + 1)),
stock = factor(stock,
levels = c("0L", "2L", "20L", "200L")))
lamprey %>%
group_by(tank, stock) %>%
summarise(n()) %>% print(n = 4)
# A tibble: 36 x 3
# Groups: tank [9]
tank stock `n()`
<chr> <fct> <int>
1 1A 0L 4
2 1A 2L 4
3 1A 20L 4
4 1A 200L 4
# … with 32 more rows
lmer() functionlm()
formula =data =
Optional, mainly debugging
REML =
TRUE: Restricted maximum-likelihood, better for random-effect variancesFALSE: Maximum-likelihood, better for fixed-effect estimatescontrol = List of optional control settingsstart Initial values for numerical model fitting
Random-intercept:
y ~ 1 + (1|group)
Correlated random intercept and slope:
y ~ x + (x|group)
Uncorrelated random intercept and slope:
y ~ x + (x||group)
lamprey_lmer <-
lmer(formula = Copies_2 ~ stock + (1|Sample),
data = lamprey)
print(lamprey_lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: Copies_2 ~ stock + (1 | Sample)
Data: lamprey
REML criterion at convergence: 217.4477
Random effects:
Groups Name Std.Dev.
Sample (Intercept) 0.1556
Residual 0.4802
Number of obs: 144, groups: Sample, 36
Fixed Effects:
(Intercept) stock2L stock20L stock200L
0.2576 3.3355 4.6871 5.9000
summary(lamprey_lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: Copies_2 ~ stock + (1 | tank)
Data: lamprey
REML criterion at convergence: 219.1
Scaled residuals:
Min 1Q Median 3Q Max
-9.8295 -0.4390 0.0487 0.3301 1.3166
Random effects:
Groups Name Variance Std.Dev.
tank (Intercept) 0.0005161 0.02272
Residual 0.2522290 0.50222
Number of obs: 144, groups: tank, 9
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.25765 0.08405 3.066
stock2L 3.33548 0.11838 28.177
stock20L 4.68710 0.11838 39.595
stock200L 5.89999 0.11838 49.841
Correlation of Fixed Effects:
(Intr) stck2L stc20L
stock2L -0.704
stock20L -0.704 0.500
stock200L -0.704 0.500 0.500
fixef(lamprey_lmer)
(Intercept) stock2L stock20L stock200L
0.2576485 3.3354850 4.6870951 5.8999891
ranef(lamprey_lmer)
$Sample
(Intercept)
0L-1A -0.015742162
0L-1B 0.007779585
0L-1C -0.033588715
0L-2A 0.019896401
0L-2B -0.076205009
0L-2C -0.017399794
0L-3A -0.012469836
0L-3B 0.110969618
0L-3C 0.016759912
200L-1A -0.012940778
200L-1B 0.016579635
200L-1C 0.028812706
200L-2A -0.046767840
200L-2B -0.037618547
200L-2C -0.025373925
200L-3A 0.035782614
200L-3B 0.026358510
200L-3C 0.015167625
20L-1A 0.128717239
20L-1B 0.109645108
20L-1C -0.280753022
20L-2A 0.035313520
20L-2B 0.070336697
20L-2C 0.023047482
20L-3A -0.085433403
20L-3B 0.011312673
20L-3C -0.012186294
2L-1A 0.001890295
2L-1B 0.036117898
2L-1C -0.017872019
2L-2A -0.086830061
2L-2B 0.166524703
2L-2C 0.139901362
2L-3A -0.052233659
2L-3B -0.127105617
2L-3C -0.060392901
confint(lamprey_lmer)
2.5 % 97.5 %
.sig01 0.00000000 0.2530644
.sigma 0.42261598 0.5516767
(Intercept) 0.07661855 0.4386785
stock2L 3.07946996 3.5915000
stock20L 4.43108011 4.9431102
stock200L 5.64397408 6.1560041
predict(lamprey_lmer) %>% head()
1 2 3 4 5 6
6.110870 6.120019 6.132264 6.193420 6.183996 6.172805
“Users are often surprised and alarmed that the summary of a linear mixed model fit by lmer provides estimates of the fixed-effects parameters, standard errors for these parameters and a t-ratio but no p-values…” Doug Bates on R-help
Add-on packages exist (e.g,. lmerTest)
data(sleepstudy)
sleep_lmer <-
lmer(Reaction ~ Days + (Days|Subject),
data = sleepstudy)
summary(sleep_lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ Days + (Days | Subject)
Data: sleepstudy
REML criterion at convergence: 1743.6
Scaled residuals:
Min 1Q Median 3Q Max
-3.9536 -0.4634 0.0231 0.4634 5.1793
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 612.09 24.740
Days 35.07 5.922 0.07
Residual 654.94 25.592
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 251.405 6.825 36.838
Days 10.467 1.546 6.771
Correlation of Fixed Effects:
(Intr)
Days -0.138
fixef()ranef()fixef() to ranef()
subject_fixed <- matrix(rep(as.vector(fixef(sleep_lmer)),
times = sleepstudy%>%pull(Subject) %>%
unique() %>%
length()),
ncol = 2, byrow = TRUE)
subject_coef <- ranef(sleep_lmer)$Subject + subject_fixed
subject_coef %>% head(.)
(Intercept) Days
308 253.6637 19.666258
309 211.0065 1.847583
310 212.4449 5.018406
330 275.0956 5.652955
331 273.6653 7.397391
332 260.4446 10.195115
sleep_lm <- lm(Reaction ~ Days * Subject - 1,
data = sleepstudy)
lm_coef <- tibble(
name = names(coef(sleep_lm)[2:19]),
intercept = coef(sleep_lm)[2:19],
slope = c(coef(sleep_lm)[1],
coef(sleep_lm)[1] +
coef(sleep_lm)[20:36]))
lm_coef %>% head()
# A tibble: 6 x 3
name intercept slope
<chr> <dbl> <dbl>
1 Subject308 244. 21.8
2 Subject309 205. 2.26
3 Subject310 203. 6.11
4 Subject330 290. 3.01
5 Subject331 286. 5.27
6 Subject332 264. 9.57
subject_coef %>% head(.)
(Intercept) Days
308 253.6637 19.666258
309 211.0065 1.847583
310 212.4449 5.018406
330 275.0956 5.652955
331 273.6653 7.397391
332 260.4446 10.195115
Reaction Days Subject predict
1 249.5600 0 308 253.6637
2 258.7047 1 308 273.3299
3 250.8006 2 308 292.9962
4 321.4398 3 308 312.6624
5 356.8519 4 308 332.3287
6 414.6901 5 308 351.9950
sleee_2 <- ggplot(sleepstudy, aes(x = Days, y = Reaction)) +
geom_point() + facet_wrap(~ Subject, nrow = 2) +
stat_smooth(method = 'lm', se = FALSE, size = 2) +
geom_line(aes(x = Days, y = predict), color = 'red', size = 2)
ggsave("./images/sleep_2.jpg", width = 8, height = 4)
lm()family = ...
formula =…data =…family =…glm( y ~ x, data = dat, family = 'guassian')
?family
...
binomial(link = "logit")
gaussian(link = "identity")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")
poisson(link = "log")
quasi(link = "identity", variance = "constant")
quasibinomial(link = "logit")
quasipoisson(link = "log")
...
binomial(link ="logit")binomial(link ="probit")
lmer(), but with family = ...gaussian family, use lmer()
library(tidyverse)
lamprey_juv <- read_csv("./data/lamprey_lab_DNA.csv")
lamprey_juv %>% head()
# A tibble: 6 x 7
Sample Fluor Copies Inhibited tank DNA Detect
<chr> <chr> <dbl> <chr> <chr> <dbl> <dbl>
1 tank1_0 FAM NA NO tank1 0 0
2 tank2_0 FAM 4.24 NO tank2 0 1
3 tank3_0 FAM NA NO tank3 0 0
4 tank4_1 FAM NA NO tank4 1 0
5 tank5_1 FAM 54.6 NO tank5 1 1
6 tank6_1 FAM NA NO tank6 1 0
lamprey_juv_plot <-
ggplot(lamprey_juv, aes(x = factor(DNA), y = Detect)) +
geom_jitter(width = 0.1, height = 0) +
stat_summary( color = 'red') +
ylab("Probability of detection") +
xlab("Lamprey stocking level") +
theme_bw()
ggsave("./images/lamprey_juv.jpg", width = 6, height = 4)
library(lme4)
glmer_lamprey_juv <-
glmer(Detect ~ factor(DNA) + Fluor + (1|tank),
data = lamprey_juv,
family = binomial("logit"))
print(glmer_lamprey_juv)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: Detect ~ factor(DNA) + Fluor + (1 | tank)
Data: lamprey_juv
AIC BIC logLik deviance df.resid
101.1087 116.4947 -44.5543 89.1087 90
Random effects:
Groups Name Std.Dev.
tank (Intercept) 0.7125
Number of obs: 96, groups: tank, 12
Fixed Effects:
(Intercept) factor(DNA)1 factor(DNA)5 factor(DNA)25 FluorHEX
-1.5777 2.2944 3.2566 4.0779 0.1499
summary(glmer_lamprey_juv)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: Detect ~ factor(DNA) + Fluor + (1 | tank)
Data: lamprey_juv
AIC BIC logLik deviance df.resid
101.1 116.5 -44.6 89.1 90
Scaled residuals:
Min 1Q Median 3Q Max
-3.4549 -0.4221 0.3007 0.4578 2.3693
Random effects:
Groups Name Variance Std.Dev.
tank (Intercept) 0.5076 0.7125
Number of obs: 96, groups: tank, 12
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.5777 0.7622 -2.070 0.038451 *
factor(DNA)1 2.2944 0.9703 2.365 0.018048 *
factor(DNA)5 3.2566 1.0384 3.136 0.001712 **
factor(DNA)25 4.0779 1.1652 3.500 0.000466 ***
FluorHEX 0.1499 0.5482 0.274 0.784453
Correlation of Fixed Effects:
(Intr) f(DNA)1 f(DNA)5 f(DNA)2
factr(DNA)1 -0.713
factr(DNA)5 -0.677 0.564
fctr(DNA)25 -0.613 0.514 0.491
FluorHEX -0.379 0.024 0.029 0.030
lme4
Q: How to quantify uncertainty around “random-effects”?
A: Bootstrap/randomization, or, move to Bayesian methods
formula = y ~ x + (1|school) + (1|student)school ~ school funding + building condition
student ~ parent education + school
Goal: Build population model to understand and compare management approaches
On GitHub
model {
to_vector(z) ~ normal(0, 1);
L_Omega ~ lkj_corr_cholesky(2);
to_vector(gamma) ~ normal(0, 5);
y ~ normal(rows_dot_product(beta[jj] , x), sigma);
}
lme4lme4 documentation